This document describes the analysis of colorectal cancer and normal samples grown in 3D culture (tumoroid and organoid, respectively). The aim of this analysis was to investigate mutations across samples using exome sequencing and current bioinformatics tools.
Samples were subjected to rigorous QC and then analysed for high-confidence mutations using the cgpWXS (CaVEMan and Pindel) and GATK Mutect2 variant callers. The Cancer Genome Interpreter tool was used to annotate variants, which allowed the identification of a number of driver (TIER1/2) mutations. A number of these annotated driver mutations showed a high variant allele frequency in cancer samples only, indicating that these mutations are being clonally selected for and are therefore likely to be facilitating tumorigenicity.
FASTQ files were first trimmed for TruSeq adapter sequences (AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG) and then assessed using FastQC (Figure 1). Samples were then aligned to the GRCh37d5 reference genome using the docker image cgpMAP (https://github.com/cancerit/dockstore-cgpmap), which uses bwa-mem.
BAM files were sorted using SAMtools and then analysed using Picard tools for QC. QC analysis included the removal of duplicate reads erroneously generated through PCR and through sequencing (optical duplicates). Exome capture of the targets was analysed using HS metrics. A summary of the tools used is shown below:
Sanger cgpWXS and GATK Mutect2 were each run independently on the samples. The docker container cgpWXS (https://github.com/cancerit/dockstore-cgpwxs) uses CaVEMan and Pindel to identify single base substitutions (SBS) and indels, respectively. An in-house custom pipeline was developed to automate the recommended tools from the GATK Mutect2 workflow (https://gatk.broadinstitute.org/hc/en-us/articles/360035531132--How-to-Call-somatic-mutations-using-GATK4-Mutect2).
VCF files from both analyses were intersected for SBS and indels, using bcftools and bedtools respectively, giving a consensus set of variants identified by both methods. VCF files were annotated using the Cancer Genome Interpreter tool (CGI - https://www.cancergenomeinterpreter.org/home).
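The idea of a consensus call set can be illustrated with a minimal Python sketch. This is not the bcftools/bedtools procedure used in the analysis: it simply keeps variants whose chromosome, position, reference allele and alternate allele match exactly in both call sets, which is closest in spirit to the SBS comparison (an interval-based overlap, as bedtools performs for indels, would behave differently). The function name and the toy VCF records are hypothetical.

```python
import io


def read_variant_keys(vcf_text):
    """Return a set of (chrom, pos, ref, alt) keys from VCF-formatted text."""
    keys = set()
    for line in io.StringIO(vcf_text):
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Multi-allelic records list ALT alleles separated by commas.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


# Toy records for illustration only (not real calls from this analysis).
header = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n"
cgpwxs_vcf = header + "1\t100\t.\tA\tT\n2\t200\t.\tG\tC\n"
mutect2_vcf = header + "1\t100\t.\tA\tT\n3\t300\t.\tC\tG\n"

# Variants reported by both callers.
consensus = read_variant_keys(cgpwxs_vcf) & read_variant_keys(mutect2_vcf)
```

Only the variant present in both toy call sets survives the intersection; variants unique to either caller are dropped.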
The overlap between the two variant callers is shown below.
The Cancer Genome Interpreter is a tool that annotates variants, giving predictions of deleteriousness, predictions of tumour drivers and a list of published drugs that may be effective for the samples.
The variant types across all samples, annotated against known cancer driver genes, are shown below (variant types tab); tier 1 and tier 2 represent higher and lower stringency of the driver prediction, respectively. The data are accessible through the CGI datatable tab, and the table can be exported using the CSV, Excel or PDF buttons. For TIER1/2 driver genes, a table of published drugs known to be effective against the variants is given (CGI drug prediction tab).
Variant (mutant) allele frequency (VAF) is commonly used in cancer samples to estimate intratumoral heterogeneity. VAF is the proportion of a particular variant in a given sample. Mutations occur frequently in cancer, and many of them most likely have little or no effect on tumorigenicity; these are termed passenger mutations. As these mutations are not selected for, you would expect them to have a low VAF (around 0). In contrast, if mutations occur in driver genes, you would expect them to be selected for and to drive clonal growth.
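As a minimal sketch of this definition, VAF at a site can be computed from read counts as the number of reads supporting the variant allele divided by the total number of reads supporting either allele. The text does not state how VAF was derived in this analysis, so the function below and its inputs are assumptions for illustration.

```python
def variant_allele_frequency(alt_reads, ref_reads):
    """Fraction of reads at a site that support the variant allele."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


# Hypothetical site: 30 reads carry the variant, 70 carry the reference allele.
example_vaf = variant_allele_frequency(alt_reads=30, ref_reads=70)
```

Raising an error at zero depth, rather than returning 0, keeps uncovered sites from being mistaken for sites where the variant is truly absent.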
If the clones are completely pure, you would expect passenger mutations to have a VAF < 0.5 and driver mutations to have a VAF > 0.5. Most cancers, however, are heterogeneous and vary from sample to sample, so it is difficult to select such a hard cutoff value. Visualising the VAF for samples can help inform the selection of driver mutations.
The figure below (top) shows the allele frequency for all high-confidence variants in ‘sample1’ that were identified in the tumoroid samples and not in the normal organoid sample. TIER1/2 mutations identified by CGI tend to have high VAF values, suggesting that variants within these genes are being selected for and are most likely driver mutations. The TIER1/2 gene names and their allele frequencies are shown in the bottom panel.
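The comparison described above can be sketched in Python as two steps: keep variants seen in the tumoroid but not in the matched normal organoid, then compare the VAF of TIER1/2 variants with the rest. The record layout (`key`, `gene`, `vaf`, `tier`), the function names and the toy values are all hypothetical and are not taken from the analysis.

```python
from statistics import median


def tumour_only(tumour_variants, normal_variants):
    """Keep variants whose key is absent from the matched normal sample."""
    normal_keys = {v["key"] for v in normal_variants}
    return [v for v in tumour_variants if v["key"] not in normal_keys]


def median_vaf_by_group(variants, driver_tiers=("TIER1", "TIER2")):
    """Median VAF of driver-tier variants versus all other variants."""
    drivers = [v["vaf"] for v in variants if v.get("tier") in driver_tiers]
    others = [v["vaf"] for v in variants if v.get("tier") not in driver_tiers]
    return {
        "driver": median(drivers) if drivers else None,
        "other": median(others) if others else None,
    }


# Toy records for illustration only.
tumour = [
    {"key": ("1", 100, "A", "T"), "gene": "GENE_A", "vaf": 0.8, "tier": "TIER1"},
    {"key": ("2", 200, "G", "C"), "gene": "GENE_B", "vaf": 0.6, "tier": "TIER2"},
    {"key": ("3", 300, "C", "G"), "gene": "GENE_C", "vaf": 0.1, "tier": None},
    {"key": ("4", 400, "T", "A"), "gene": "GENE_D", "vaf": 0.5, "tier": None},
]
normal = [
    {"key": ("4", 400, "T", "A"), "gene": "GENE_D", "vaf": 0.5, "tier": None},
]

specific = tumour_only(tumour, normal)
summary = median_vaf_by_group(specific)
```

A median is used here rather than a fixed cutoff because, as noted above, heterogeneous samples make a single hard threshold difficult to justify.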
In summary, this analysis identified a number of consensus variants called by two independent software tools - GATK Mutect2 and cgpWXS. For tumour samples, CGI identified a number of these variants as putative driver mutations. In support of this, these variants showed a high VAF (>0.5), suggesting that they have been selected for and are driving tumorigenicity.